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Abstract 

In this work we present a technique of fast numerical computation 
for solutions of Navier-Stokes equations in the case of flows of industrial 
interest. At first the partial differential equations are translated into 
a set of nonlinear ordinary differential equations using the geometrical 
shape of the domain where the flow is developing, then these ODEs are 
numerically resolved using a set of computations distributed among the 
available processors. We present some results from simulations on a 
parallel hardware architecture using native multithreads software and 
simulating a shared-memory or a distributed- memory environment. 

keywords: Navier-Stokes equations, streamlines, nonlinear differen- 
tial equations, parallel computation. 

1 Introduction 

Design, development and engineering of industrial power burners have 
strong mathematical requests, such as numerical resolution of differential 
problems involving Navier-Stokes equations for velocity and pressure fields, 
geometrical design of combustion heads for a correct shape and optimal ef- 
ficiency of flame, geometrical design of ventilation fans and computation of 
correct air inflows for optimal combustion, computation of flows in the inter- 
nal vanes of oil pumps for estimating the correct mass rate and dimensions 
of gears. 

Rapid prototyping for an accurate design of the correct geometries involves 
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a numerical simulation of the gas or oil flows in these burner's components. 
The necessity of an high graphic resolution for the visualization of the flows 
requires the knowledge of a large amount of streamlines. Therefore the nu- 
merical computation is very onerous about amount of memory and power 
of processors of the used hardware environment. A technique for reducing 
the total time of computation can be very useful. In this paper we present a 
method based on reduction of Navier-Stokes system to a set of not coupled 
ordinary differential equations. Each of these equations describes the fluid 
velocity field along each flow streamline. The advantage of this method, 
compared to an usual one such as finite differences scheme (see e.g. [1]), is a 
faster computation of the numerical resolution and an easy parallelization in 
a multicore environment, using a distribution of blocks of streamlines among 
the available processors. 



Figure 1: Flows (gas at central zone) in a burner's combustion head. The swirling 
effects are very important for an optimized combustion, and they are very difficult 
to simulate by standard numerical methods. 



2 The mathematical model 

In this section we briefly present the mathematical model developed for sim- 
bolic and numerical resolution of the differential problems about some flows 
in burner's components. 

The main set of equations is the Navier-Stokes system describing the dy- 
namics of the fluids (see e.g. [2]): 



where v is the velocity field, p is the pressure, p the density and [i the dy- 
namic viscosity of the gas or oil. Sometimes there are other equations to 
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be considered, such as continuity equation, but here we would describe a 
general simplified case. A geometric characterization of the flow is offered 
by the streamlines, that is the continuous lines on which the tangent at any 
point is parallel to the velocity vector. 

At every instant of time, the velocity field in the domain of the flow is the 
set of vectors which give the velocities of the particles at every point {eule- 
rian description of a flow). The computation of this vectorial field is done 
by numerical discretizations of (pQ), e.g. using finite temporal steps, which 
are usually expensive for CPUs and central memory of an hardware envi- 
ronment. 

The fluid motion can be described by the set of all the trajectories followed 
by the particles and the speed along these geometric lines in a given interval 
of time {lagrangian description of a flow). The computation of the trajecto- 
ries usally requires the knowledge of the velocity field, therefore numerical 
methods are expensive too (see [3] for a review on computational fluid dy- 
namics techniques). 

We present a method for computing the velocity field by reduction of Navier- 
Stokes equations to ordinary differential form. 

Let <E> € C k ([ri,rM] X [so,«i] X [ti,tf],M. 3 ) be a smooth (k > 1) function 
such that, if to £ [ii>^e] is a time value between initial ti and final instant 
tj, and 6 [ri,rjy], the curve <f> : s i — ► $(r^,s,to) is a C k ([so, s\], M. 3 ) 
parameterized invertible representation (see [1]) of a streamline at time to. 
The r parameter identifies a single streamline, and in numerical computation 
it takes integer values in interval 1,M, where M is the total number of 
streamlines. Then temporal evolution of the flow is described by the map 
t i — ► 4»(r, s,t), for (r,s) G [n^A/] x [soj^l]- At any to G [U,tf], the map 
(r, s) i — ► $(r, s,to) is a snapshot of all the streamlines at time to- We can 
consider the velocity field of this snapshot as a set of vectors depending only 
from (r, s) and not from t, so that for the instant to + dt the field is computed 
using the new snapshot s i — ► $(r, s, to + dt). 

For fixed to and r^, let (xj)i<j<3 be a cartesian coordinates system in M 3 and 
s the parameter of the representation s i — > <p(s) = $(r<^,s, to)- In general, 
under usual hypothesis of regularity, the relation between a streamline and 
its tangent vector field is 

v A = k 

V 3 <t>3 

where <j){ = Now we define the parameterized velocity field along the 
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streamline as 



u = v o (j) (3) 

so that u = u(r<£, s, to) = u (s)- Using chain rule we have Ui = J2m=i v im4>m 
and, from the following relation holds: 

Ui = {vivn + v 2 v i2 + v 3 v i3 ) — (4) 

Vi 

Note that the expression vivn + v 2^2 + ^3^3 is the i-th component of v • Vv 
in Navier-Stokes system, so it can be substituted by ViUi(4>i)~ l . 
Now let g = <?(x) be the inverse of <p and q = p o 0; using standard compu- 
tations we obtain 
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+ u t — 2 (6) 
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Along the streamline s 1 — ► 4>(s) we have dtv = 9<(u o g) = 0, so Navier- 
Stokes equations can be translated into the following ordinary differential 
equations for the functions (tij)i<j<3: 

dg 

purify = -Q-q^- + ^ (Vg • Vg Hi + Ag in) (8) 

where p and p should be considered as function of the parameter s and the 
derivatives of g should be expressed throught those of (p. These equations are 
again nonlinear, and a possible symbolic resolution required non standard 
analytical methods (as Lie symmetries, see [5]), but now all the derivatives 
are ordinary and the equations are no more coupled. 



3 Example: internal gears pump 

As example of technological interest, we consider an internal gears pump 
for oil, usually formed by a central internal rotor (pignon) which moves an 
external one (crown). 
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Figure 2: The particular case of lobes gear pump. The internal curve is the profile of the 
pignon, the external the profile of crown. The internal zones are full of oil, which is transported 
from suction inlet to delivery outlet. The analytical equations of the curves are of type (a;, y) = 
[r + sin(<fr + u)t)] (cos(n + t), sin(n + t)). 



The oil is contained in the geometrical zones (vanes) between the two 
rotors. The knowledge of the flow in these vanes is important for a compre- 
hension of some critical phenomena as cavitation (bubbles incoming) or loss 
of lubrication. 

We would study the flow in the critical zone of oil compression, where the 
rotors are very near and the vane has a small volume. Here we consider, for 
simplicity, that streamlines have the same geometrical shape of the rotors. 
In the case, e.g., of a classical teeth gears pump, without considering a di- 
mensional factor, a single profile (left side of the tooth) can be described by 
the following representation 



cos(s) + s sin(s) 
sin(s) — s cos(s) 



(9) 



where 7r + arctan(so) < s < |7r, for an opportune value sq. 

The evolution in time can be handled, e.g., by a rotation of angle 9 = ut 
for an interval of time small enough to have a costant shape for the volume 
between the two rotors. Then in this interval the streamlines have a new 
parametrization given by 



$(s,t) = &(a) 



cos(ut) —sin(tot) 
sin(ivt) cos(ujt) 



cf>(s) 



(10) 
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Figure 3: Tooth profile for internal rotor of a standard gear pump. The (O are the equations 
of the left side. 



but the derivatives in the form (JS|) of Navier-Stokes equations are referred 
only to s parameter again. With standard calculations it can be shown that, 
in the simple case of constant pressure, these equations have the nonlinear 
form 



4 The computational algorithm 

For computing a solution of ([8]) , also in case of more complex geometries than 
previous or in case of turbolent flow, we have tested an algorithm based on 
smooth interpolation of streamlines which are computed by a first rough 
numerical resolution of original Navier-Stokes equations. For example, in 
the oil suction channel (see fig. H|) there is a geometric angle where the 
flow can become turbulent and problems of cavitation (formation of air or 
vapours bubbles due to local gradient of pressure) can happen. 
We describe now the algorithm. For a chosen set of time values, a suitable 
number of streamlines are computed using a finite difference scheme on a 
grid covering the geometrical domain of interest (see [2] ) . Then each stream- 
line is divided into couples of geometrical consecutive points, i.e. {Pi,P2}, 
{P2, -P3}, {P/v-i; Pn}- Every couple is interpolated with a cubic polyno- 
mial p = p(t), t € [0, 1] (t is not time), imposing the following four analytical 
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conditions: 



p(0) 
P(l) 

y(o) 



■ffc+i 



(12) 



where 1 < k < N — 1 and u is the numerical solution (velocity field) of the 
original Navier-Stokes system (see [6] for details). In this way we obtain a 
set of class C 1 new streamlines. Note that cubics are the smallest degree 
one variable polinomials useful to have continuous differentiation along the 
interpoled lines. Each new streamline can be represented using, e.g., a 
unique parameter s € [0,1] with the re-par ametrization s = , with 

m EN, < m < N — 1; if / = f(t) is a cubic between two generic points, 
the curve 6 = 6(s) has derivative 



N 



( 1 
dt 



(13) 



and two consecutive cubics have the same derivative in their common point. 
The first order derivative (f) is necessary for the ODE form ([8]) of the Navier- 
Stokes equations. It can be shown (see [6]) that, if M is the number of 
streamlines which we want to interpolate, all the 4 x (N — 1) X M coeffi- 
cients of the cubics can be computed by N — 1 matrix-vector products Ab 
where A is the 4M x AM matrix 
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and b is the vector b = (p(i, k ),P(i,k+i), ■~,V(M,k),V(M,k+i)), 1 < k < N - 1. 
Note that A is a sparse matrix with density number (see [7]) at most j^. 
Also, A is a block diagonal matrix where each block is constant, so the 
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Figure 4: Suction zone for lobe pump. Oil flows from left horizontal channel and is distributed, 
by drop of pressure, among the lobes into the pump. 



execution of a product like Ab can be optimized using standard techniques 
(seed). 

5 About numerical resolution 

In this section we discuss the numerical resolution of the nonlinear differen- 
tial equations dH). 

For physical reasons, as typical in eulerian point of view, we might resolve an 
initial value problem for every streamline and for every instant of discretized 
time, imposing two known values no and u\ for u at s = so and s = s\ and 
two known values iiQ and ii\ for u at s = sq and s = s\ respectively. The 
amount of total computations for this method depends on the number of 
streamlines, on the number of time values considered, and on the spatial 
step size (that is, degree of discretization for the independent variable in- 
terval [so,si]) of the algorithm used for the resolution of a single equation. 
In addition, for complex geometries or turbulent flows, there is the rough- 
resolution of the classical Navier-Stokes system and then the interpolations 
of the streamlines so found. 

Using an hardware environment for numerical computations, we have tried 
to limit the total time spent on this work and the amount of total memory 
required for handling the data, so that one can use typical workstations usu- 
ally devoted to technical and scientific computing. In particular, we have 
used a system with 2 dual-core processors (3.2 GHz as clock frequency) and 
4 GByte of central memory. We have prefered the software Mathematica, 
version 5.2, for the general numerics and its internal function NDSolve for 
the numerical resolution of differential equations. 
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The function NDSolve (see [8] for details) has the option Method for 
specify the desired numerical scheme and the option StartingStepSize for 
specify the initial grid step for the independent variable. This step is then 
automatically adapted by an internal procedure if necessary. We have chosen 
for Method the Adams- Moulton or Backward Differentiation schema for their 
reliability when used with nonlinearity as uii type (see e.g. [9] for technical 
informations and [10J for some examples). The application of NDSolve re- 
quired the specification of the initial values u and u respectively for velocity 
and acceleration along a streamline. The numerical resolution of associated 
algebraic systems is then executed using advanced methods based on Lapack 
technology (see [TT]). 

Version 5.2 of the software Mathematica provides support for multicore com- 
putations, spreading a single massive operation into a suitable number of 
parallel threads among the available processors (see [32] )• In particular the 
numerical linear algebra operations seem to have the major benefits from 
multicore support. Therefore we have tested these features for the matrix- 
vector products for the computation of all the coefficients of the cubics in 
the case of complex shape streamlines, and for the numerical resolution of 
the nonlinear differential equations, where the use of implicit discretization 
schema involves matrix computation and resolution of many systems of al- 
gebraic equations. Also, Mathematica has the function BlockMatrix of the 
package MatrixManipulation for construction and manipulation of block 
diagonal matrix A from block T, and the multiplication of sparse-dense 
matrices like Ab is highly optimized (see [13]). 

In Mathematica all the calculations are executed by a central logical unit 
called kernel, which distributes parallel threads of a single computation 
among the processors or cores available. So we have measured the per- 
formances of a single kernel spreading computations on 4 cores (2 physical 
processors), simulating a shared- memory system. As many kernels can run 
indipendently on the same machine, we have also tested the performances of 
2 kernels running computations on 2 physical processors, but in this case us- 
ing suitable procedures we have previously equi-distributed, among the two 
units, the data to handle. The two kernels have independently addressed 
their own memory allocation, simulating a distributed-memory system. 
For comparison, we have executed the same computations on a single one- 
core processor (3.2 GHz) machine with 4 GByte of memory. 
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Table 1: Speedup for computing cubics coefficients, 100 couples of points for each of M stream- 
lines. Right column shows speedup results in the case of 2 kernels with 1 dual processor for each 
of them (distributed programs), while central column shows results in the case of 1 kernel with 
2 dual processors (multicore program). Multicore single program presents better performances 
when amount of data increases. 
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6 About performances 

The Mathematica internal function AbsoluteTiming gives the total elapsed 
time spent by the kernel for the execution of a single task. Therefore this 
time amount is the sum of the time spent by CPUs and the time spent by 
program (and operating system) for handling data among threads. For ex- 
ample, the command AbsoluteTiming [A*b;] gives the total elapsed time of 
execution of all the necessary operations for the calculation of the product 
c=A*b, without front-end visualization of the result (note the use of ; in the 
expression), but with product vector c stored into memory. In this way, if 
a task is spreaded by the kernel among parallel threads, the output given 
by this built-in function is the effective total time spent for completion of 
calculations, and it's not the sum of the single times spent by single threads. 

For an evaluation of the performances of the multicore technology, we con- 
sider the notion of speedup S = that is the ratio between the elapsed 
time spent by a single processor session and the elapsed time spent by a 
multicore session for the execution of the same task. 

Now we discuss the results obtained for interpolation of streamlines in com- 
plex geometry (fig. 0|) and for resolution of corresponding flow differential 
equations. 

In Table 1 we report the performances registered for the computation of 
M streamlines interpolated by cubics, in the case of 100 couples of points 
for each of them. Note the good performance of the multicore technology 
on single kernel case when one increases the number of streamlines, while 
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Table 2: Speedup for computing velocity field, N = 100 pairs of points for each of M streamlines. 
Right column shows speedup results in the case of 2 kernels with 1 dual processor for each of 
them (distributed programs), while central column shows results in the case of 1 kernel with 
2 dual processors (multicore program). Multicore single program has better performances than 
distributed parallel programs. 
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the other case offers a slower speedup due to time spent in spreading and 
collecting data among the two kernels. The efficiency shown by the single 
kernel method when the size of the numerical parallel task increases is an 
experimental proof of the so called Gustafson generalization of Amdahl Law 
(see e.g. [14"]). 

When solving differential equations, if N is the number of couples of points 
used for interpolation, the number of ODEs for every velocity component is 
N too, because the cubics coefficients are not the same for two consecutive 
pairs. So the computed numerical value of the solution at the end point of 
the k-th pair is the initial value for computing the solution of the (k + 1)- 
th pair. In this way, using a built-in function like NDSolve, the numerical 
computation of the solution along a streamline becomes an essentialy serial 
task. For evaluating the effect of multicore Mathematica technology in the 
case of vectorial operations, we have tabulated all the coefficients and initial 
values in a iV dimensional array inputs and applied NDSolve to it using a 
vectorial single instruction like Map [modif iedNDSolve , inputs] , being Map a 
built-in optimized iterator and modif iedNDSolve a suitable modified version 
of NDSolve which accepts as input the cubics coefficients. This algorithm 
is mapped to the array of all the M streamlines, assigning at each step as 
initial values for u and u the end values computed in the previous step. 
In particular, for N = 1 we have the case of a priori assigned streamlines 
family, as that of the flow in gears pump. 

Table 2 reports the performances for the numerical resolution of ODEs. The 
speedups are smaller than those of the previous case, probably for the greater 
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Figure 5: A turbulent flow streamline in the phase of oil suction. The parameterized equation 
is of type s i — > ([ai + a,2s]sin(s) + 03s, [ai + a,2s]cos(s), a^s). 



percentage of not parallelizable operations required by the used algorithm, 
but again the performances are better for the multithreads single program. 
Note that the virtual distributed-memory environment fails when the size 
of the problem reaches a critical value, while the virtual shared-memory one 
can complete the task, probably for a better management of memory and 
centrally cached informations. Note also that the elapsed time necessary for 
a full calculation of velocity field by a single kernel on a single processor 
has a range, depending on M, variable from few seconds to tens of minutes, 
therefore for great sizes or large physical domains the speedup of a multicore 
computation could help in a significant way. 

7 Conclusions 

From the discussion about the used algorithms and registered performances, 
we can deduce the following conclusions on the parallel resolution, in a 
multicore environment, of blocks of ordinary differential equations: 

• the multicore technology gives better speedups in the case of a shared- 
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memory environment with a single central multithreads program which 
executes all the computations; 

• on a multicore machine, from the last line of table ([U]) we deduce that 
shared-memory single kernel resolution can execute the computations 
on a greater amount of data than distributed-memory multi-kernel 
resolution, which requires a greater allocation of memory; probably 
this fact depends on the use, made by shared-memory method, of a 
central caching of temporary data instead of a distributed one; 

• mathematical computations on large amount of scientific or technical 
data can take advantage from the use of numerical software specifi- 
cally designed for a central multicore architecture, as already shown 
in previous very particular experiences (see e.g. [15]): 

• the streamlines algorithm and a multicore computation can help to 
speedup the search, in a suitable set of possible flows, of particular 
solutions of physical and technical interest; for example, in the case 
of a small gradient of pressure we have numerically tested, from a set 
of possible pressure fields, the existence of a solution of Navier-Stokes 
equations along a given turbulence streamline like that of Fig. (5), 
which has importance for the prediction of cavitation in pump. 
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